Main analysis: Green space and gun shooting incidents
This is our descriptive analysis and visualization of our main exposure and outcome of interest for this descriptive analysis: parks and gun shooting incidence.
parks_df =
read_csv("Data Folder/Parks_Properties_Oct2025.csv", na = c("NA", ".", "")) |>
janitor::clean_names() |>
select(acquisitiondate, borough, zipcode, address,
eapply, location, acres, typecategory, retired) |>
rename(boro = borough) |>
drop_na(address) |>
mutate(
boro = case_when(
boro == "B" ~ "Brooklyn",
boro == "M" ~ "Manhattan",
boro == "Q" ~ "Queens",
boro == "R" ~ "Staten Island",
boro == "X" ~ "Bronx",
TRUE ~ NA_character_
),
address = paste(address, "New York, NY", sep = ", ")
) |>
filter(boro == "Manhattan")
shooting25_df =
read_csv("Data Folder/Shooting_2025.csv", na = c("NA", ".", "")) |>
janitor::clean_names() |>
distinct(incident_key, .keep_all = TRUE) |>
filter(boro == "MANHATTAN") |>
drop_na(latitude) |>
mutate(statistical_murder_flag = case_match(statistical_murder_flag,
"N" ~ FALSE,
"Y" ~ TRUE))
shooting06_24_df =
read_csv("Data Folder/Shooting_Historic.csv", na = c("NA", ".", "")) |>
janitor::clean_names() |>
distinct(incident_key, .keep_all = TRUE) |>
mutate(
date_obj = lubridate::mdy(occur_date),
created_year = lubridate::year(date_obj))|>
filter(boro == "MANHATTAN",
!incident_key %in% c(279138121, 272105041)) |>
drop_na(latitude)
merged_shooting_df <- bind_rows(shooting25_df, shooting06_24_df)
Shooting incident data sets (2025 and Historic) and Park data set
Data sets, data sources, and exclusion/inclusion criteria
We have 2 data sets of shooting incidents in New York City from NYC Open Data.
shooting25_dfis of shooting incidents that occurred in 2025 thus far (1/1/2025 ~ 09/30/2025).shooting06_24_dfincludes all historic shootings that occurred from 01/01/2006 to 12/31/2024. Both data sets of shooting incidents were filtered to exclude missing latitude/longitude and duplicate incidents, as in the case of multiple perpetrators or victims. We also filtered the borough (boro) to shooting incidents that occurred in Manhattan, as this is our location of interest. After mapping, incident_key = 279138121 and 272105041 inshooting06_24_dfwere excluded because their latitudes/longitudes don’t match with the borough.The
merged_shooting_dfis the merged file of both shooting data sets (shooting25_dfandshooting06_24_df).parks_dfidentifies property managed partially or solely by NYC Parks. We dropped theNAsof parks without addresses, as this is a key component of my green space analysis.
### Load Manhattan ACS population data
# Download Manhattan population data with geometry
manhattan_pop <- get_acs(
geography = "tract",
variables = "B01003_001",
state = "NY",
county = "061",
geometry = TRUE,
year = 2023
)
### `county = "061"` indicates Manhattan
### `variables = "B01003_001"` indicates that we are requesting the total population estimate for each census tract from the ACS 5-year survey. This variable provides a count of all residents each census tract, which is useful for mapping population distribution and describing spatial relationships with parks and shooting incidents.
### Geocode parks
# 1. Load parks data and geocode according to census
library(tidygeocoder)
parks_geo <- parks_df |>
geocode(address, method = "census", lat = latitude, long = longitude)
### `geocode()` took `address` and gave the approximate latitude (`latitude`) and longitude (`longitude`) according to the Census geocoding API. In this way, We am able to locate the approximate location of each park.
# 2. Remove rows where geocoding failed
parks_geo_clean <- parks_geo |>
filter(!is.na(latitude) & !is.na(longitude))
# 3. Convert to sf
parks_sf <- st_as_sf(parks_geo_clean,
coords = c("longitude", "latitude"),
crs = 4326)
### We converted the parks data frame to an sf object (spatial data frame) using `st_as_sf()`. In this way, each row is registered as point geometry based on its latitude and longitude.
# 4. Load Census tracts for NY
tracts <- tracts("NY", year = 2023, class = "sf")
### We loaded the 2023 Census tract polygons for all of New York State; these polygons have GEOIDs and boundaries from the Census.
# 5. Make sure CRS match
parks_sf <- st_transform(parks_sf, st_crs(tracts))
### We transformed the CRS (Coordinate Reference System) with `st_transform()` to match the CRS of the parks data to the ACS polygons in Manhattan (`manhattan_pop`). Matching CRS is necessary so that the layers align properly on maps.
# 6. Spatial join to attach GEOID
parks_with_geoid <- st_join(parks_sf, tracts[, "GEOID"])
- We removed parks where the geocoding failed, as this means that there is something incorrect about the address. Because we do not know if other location-based variables are incorrect, we have removed these parks from the data set.
### Convert each shooting dataset to sf and ensure CRS match
### We don't need to geocode shooting data, since the latitude and longitude are already given. We just converted the shooting data frames to an sf object (spatial data frame) using `st_as_sf()`. In this way, each row is registered as point geometry.
### Then, We transformed the CRS (Coordinate Reference System) with `st_transform()` to match the CRS of the shooting data to the ACS polygons in Manhattan (`manhattan_pop`). Matching CRS is necessary so that the layers align properly on maps
# 2025 shootings
shootings25_sf <- shooting25_df |>
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |>
st_transform(st_crs(manhattan_pop))
# Historic shootings (2006–2024)
shootings06_24_sf <- shooting06_24_df |>
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |>
st_transform(st_crs(manhattan_pop))
# Combined shootings
merged_shootings_sf <- merged_shooting_df |>
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) |>
st_transform(st_crs(manhattan_pop))
# Calculate shootings per acre of park for census tracts with green space
### shootings_sf → latitude/longitude locations of shootings, sf object
### parks_sf → latitude/longitude locations of parks with acreages, sf object
### acs_polygons → census tract boundaries for Manhattan, sf object
# Function for calculate_shootings_per_acre
calculate_shootings_per_acre <- function(shootings_sf, parks_sf, acs_polygons) {
# Assign shootings to census tracts in Manhattan
shootings_with_geoid <- st_join(shootings_sf, acs_polygons[, "GEOID"], join = st_within)
# Aggregate total park area per census tract according to GEOID
park_area_by_tract <- parks_with_geoid |>
st_drop_geometry() |>
group_by(GEOID) |>
summarize(total_acres = sum(acres, na.rm = TRUE)) |>
as.data.frame()
# Count shootings per census tract
shootings_by_tract <- shootings_with_geoid |>
st_drop_geometry() |>
group_by(GEOID) |>
summarize(num_shootings = n()) |>
as.data.frame()
# Merged file of park and shooting location data with ACS polygons of Manhattan
tracts_data <- acs_polygons |>
left_join(park_area_by_tract, by = "GEOID") |>
left_join(shootings_by_tract, by = "GEOID") |>
mutate(total_acres = replace_na(total_acres, 0),
num_shootings = replace_na(num_shootings, 0),
shootings_per_acre = if_else(total_acres > 0,
num_shootings / total_acres,
NA_real_)
)
return(tracts_data)
}
### Calculate shootings per acre of park in census tracts with green space for each shooting file
# 2025 shootings
tracts_data_2025 <- calculate_shootings_per_acre(shootings25_sf,
parks_with_geoid, manhattan_pop)
# Historic shootings (2006–2024)
tracts_data_hist <- calculate_shootings_per_acre(shootings06_24_sf,
parks_with_geoid, manhattan_pop)
# Combined shootings
tracts_data_merged <- calculate_shootings_per_acre(merged_shootings_sf,
parks_with_geoid, manhattan_pop)
Map shooting incidents per acre of park in each census tract with green space
- Note: Census tracts without park land are in
NA. Because approximately 40% of parks could not be geocoded and our spatial analysis relied on single point locations (rather than full park polygons), many census tracts appear to have no park acreage. Including these zero-acre tracts in the park-acreage map would visually dominate the color scale with zeros and obscure meaningful variation among tracts that do have park land. Therefore, setting these tracts toNAallows the map to highlight only census tracts where park acreage could be identified.
# 2025 shootings
mapview(
tracts_data_2025,
zcol = "shootings_per_acre",
legend = TRUE,
layer.name = "Shootings 2025 per Acre",
map.types = "CartoDB.Positron",
col.regions = viridis::viridis(10)
)
<<<<<<< HEAD
=======
>>>>>>> 51d473b11348127cfedb6d71295ae83d0edb1dc7
# Historic shootings
mapview(
tracts_data_hist,
zcol = "shootings_per_acre",
legend = TRUE,
layer.name = "Historic Shootings per Acre",
map.types = "CartoDB.Positron",
col.regions = viridis::viridis(10)
)
<<<<<<< HEAD
=======
>>>>>>> 51d473b11348127cfedb6d71295ae83d0edb1dc7
# Merged shootings
mapview(
tracts_data_merged,
zcol = "shootings_per_acre",
legend = TRUE,
layer.name = "All Shootings per Acre",
map.types = "CartoDB.Positron",
col.regions = viridis::viridis(10)
)
<<<<<<< HEAD
=======
>>>>>>> 51d473b11348127cfedb6d71295ae83d0edb1dc7
Interpretation
- For census tracts with park acreage, the highest densities of shootings per park acre were concentrated in Harlem and Upper Manhattan/West Harlem across all datasets. In the 2025 shooting dataset, Census Tract 287 in Inwood exhibited a noticeably higher shooting per acre density compared to its surrounding tracts. This is an elevation not observed in data sets of previous years. This spike in 2025 may reflect incomplete data coverage, as the 2025 dataset includes incidents only through September. As shown in our time analysis, shooting incidents typically rise later in the year, particularly during the winter months, which may explain this apparent discrepancy. The pattern could reflect either incomplete data capture or a genuine increase in shootings in this tract. Without the complete data or comparisons between seasonality across time, we cannot distinguish between these hypotheses.
Map density of parks (total park acreage) in each census tract
Limitations
- This map reflects a key limitation of our spatial analysis. Out of 394 parks in Manhattan, only 273 locations (69%) had addresses in the original dataset. An additional 30 addresses were dropped because the Census library could not successfully geocode them. Consequently, only 61% of park addresses could be geocoded to identify their corresponding census tract.
- Our spatial analysis was further limited because we could only assign a single latitude and longitude point for each park, corresponding to its main address. In reality, this approach likely overestimated and underestimated the total park acreage within each census tract (depending on how the park boundaries fall in reality), as it does not account for the physical area of parks that may span multiple tracts. Parks that could not be geocoded underestimated the analysis.
- As stated in the beginning of this section, the points above are the reason that we excluded census tracts without parks in this description of shooting incident density.
# Aggregate total park area per tract
park_area_by_tract <- parks_with_geoid |>
st_drop_geometry() |>
group_by(GEOID) |>
summarize(total_acres = sum(acres, na.rm = TRUE)) |>
as.data.frame()
tracts_parks <- manhattan_pop |>
left_join(park_area_by_tract, by = "GEOID") |>
mutate(total_acres = if_else(total_acres > 0, total_acres, NA_real_))
mapview(
tracts_parks,
zcol = "total_acres",
legend = TRUE,
layer.name = "Park Acreage per Tract",
map.types = "CartoDB.Positron",
col.regions = viridis::viridis(10)
)
<<<<<<< HEAD
=======
>>>>>>> 51d473b11348127cfedb6d71295ae83d0edb1dc7
# Aggregate total park area per tract
park_area_by_tract <- parks_with_geoid |>
st_drop_geometry() |>
group_by(GEOID) |>
summarize(total_acres = sum(acres, na.rm = TRUE)) |>
as.data.frame()
tracts_parks <- manhattan_pop |>
left_join(park_area_by_tract, by = "GEOID") |>
mutate(total_acres = if_else(total_acres > 0, total_acres, NA_real_))
mapview(
tracts_parks,
zcol = "total_acres",
legend = TRUE,
layer.name = "Park Acreage per Tract",
map.types = "CartoDB.Positron",
col.regions = viridis::viridis(10)
)
<<<<<<< HEAD
=======
>>>>>>> 51d473b11348127cfedb6d71295ae83d0edb1dc7
Interpretation
- Interestingly, when comparing the census tracts with the highest park acreage to the map of shootings per acre, these tracts tend to have lower shootings per acre relative to surrounding tracts with smaller park acreage. Despite the limitations noted above, this suggests a descriptive pattern where greater park presence is associated with lower shooting density.

